MSDS420

Assignment 3: Chicago Crimes

Author : Atef Bader, PhD

Last Edit : 3/30/2019

image.png

Deliverables:

  • Submit a single zip-compressed file that has the name: YourLastName_Assignment_5 that has the following files:

    1. Your PDF document that has your Source code and output
    2. Your ipynb script that has your Source code and output

Objectives:

  • Use SQL to execute different queries to retrieve data from Chicago Crime dataset and Police stations dataset
  • Use Geospatial queries to locate police stations and gun related crimes (with arrest or no arrest) in every district on Choropleth map
  • Use Geospatial queries to provide descriptive stat for every district on Choropleth map
  • Use Geospatial queries to locate the Block that is the furthest (Maximum Distance) from the police station that has gun related crime resulted in arrest

Submission Formats :

Create a folder or directory with all supplementary files with your last name at the beginning of the folder name, compress that folder with zip compression, and post the zip-archived folder under the assignment link in Canvas. The following files should be included in an archive folder/directory that is uploaded as a single zip-compressed file. (Use zip, not StuffIt or any 7z or any other compression method.)

  1. Complete IPYNB script that has the source code in Python used to access and analyze the data. The code should be submitted as an IPYNB script that can be be loaded and run in Jupyter Notebook for Python
  2. Output from the program, such as console listing/logs, text files, and graphics output for visualizations. If you use the Data Science Computing Cluster or School of Professional Studies database servers or systems, include Linux logs of your sessions as plain text files. Linux logs may be generated by using the script process at the beginning of your session, as demonstrated in tutorial handouts for the DSCC servers.
  3. List file names and descriptions of files in the zip-compressed folder/directory.

Formatting Python Code When programming in Python, refer to Kenneth Reitz’ PEP 8: The Style Guide for Python Code: http://pep8.org/ (Links to an external site.)Links to an external site. There is the Google style guide for Python at https://google.github.io/styleguide/pyguide.html (Links to an external site.)Links to an external site. Comment often and in detail.

Descriptions and Requirement Specifications

Chicago Crimes

In his first state of the union address , president Trump mentioned Chicago violance 10 times Trump's State of the Union Address

Chicago has more homicides than New York and Los Angeles combined

Columnist Clarence Page wrote an article , published by the Chicago Tribune stated that the city of Chicago had more homicides in the past two years than New York and Los Angeles combined

Chicago Police Department

Chicago police department CPD issues and publishes on daily basis on its website crime alerts, and press releases for the different districts .

image.png

The CPD categoizes the crimes into 8 categories as follows:

image.png

Chicago Crimes Dataset

The CSV file for crimes dataset for the city of Chicago is obtained from the data portal for the city of Chicago. Here is the link for the city of Chicago data portal City of Chicago Data Portal

image.png

Loading the Dataset CSV file

Three datasets are need for this assignment:

  1. The Chicago police stations in every district
  2. The Boundaries.geojson data for district boundries
  3. The Crimes dataset

Lets load the CSV file into a DataFrame object and see the nature of the data that we have.

Complete description of the dataset can be found on Chicago city data portal.

Based on Trumps State of the Union Address and the article written by columnist Clarence Page and published by the Chicago Tribune, we are interested to retrieve the data for the past two years and perform different types of spatial queries.

There are few of these queries that we are interested in to help CPD and city of Chicago to plot on a Choroplteh map those districts that have highest gun crimes.

Here are examples of those types of queries:

  1. Plot on Choropleth map the districts and their Violent Crimes
  2. Plot on Choropleth map the districts and their Gun related crimes
  3. Which district is the crime capital of Chicago districts?
  4. What the crime density per district?
  5. Plot on Choropleth map those gun related crimes that resulted in arrests
  6. Plot on Choropleth map the gun related crime that is in the farthest Block from the policy stattion for every district

Packages you need to Connect PostgreSQL server to load and retrieve Crhicago Crime dataset from the database:

  1. psycopg2: for PostgreSQL driver
  2. area: to calculate the area inside of any GeoJSON geometry
  3. Folium: for Choropleth maps

Execute the pip install command from the command window to install the packages listed bove

image.png

Execute the following pip show command from the command line to get info about any package you installed. Make sure that the packages got installed into Anaconda library since we are using Anaconda tool; please see below the pakcage info

image.png

Since we are using PostGIS in our work, please read and bookmark Chapter 4. Using PostGIS: Data Management and Queries

In [1]:
import folium
from folium import plugins
from folium.plugins import MarkerCluster
import psycopg2
import csv
import pandas as pd
import json
from area import area

from psycopg2.extensions import ISOLATION_LEVEL_AUTOCOMMIT 

IMPORTANT NOTES:

  • Use this version of the assignment if you don't want to install PostgreSQL server on your laptop/computer locally to experiment with database and tables creation
  • In this version of the assignment you will be connecting to DSCC PostgreSQL server that has the database and tables already created on the server.
  • You need to use your NetID and password for login and password to connect PostgreSQL server hosted on DSCC
  • For the psycopg2.connect statements listed below, you must provide your NetID and password in order to connect to PostgreSQL server hosted on DSCC
In [2]:
# To run the script on the complete dataset takes roughly 35 minutes to omplete.
# Use this data set for your final submission of your Assignment 3
# Uncomment the following line after you unit test your code and ready to run and submit your assignment on this dataset 

# db_connection = psycopg2.connect(host='129.105.208.229',dbname="chicago_crimes", user="YourNetID" , password="YourPassword")




# Use the following dataset for unit testing purposes only. It takes roughly 5 minutes to omplete.
# Comment the following line when you are done with your unit testing and ready to run your assignment on the complete dataset and submit your Assignment 

db_connection = psycopg2.connect(host='129.105.208.229',dbname="chicago_crimes_ut", user="pmw9564" , password="")


cursor = db_connection.cursor()

Chicago Crimes Dataset

The Crimes_2001_to_present.csv is downloaded from Chicago data portal and it has roughly 6.5 million records.

While working on this dataset, It is prudent to make a note of the following:

  1. Geospatial queries are very demanding for system resouces like CPU, Memory, and DISK
  2. We are interested in the data set of the past 2 years, and when you execute Geospatial type queries, please be advised that these queries slow down your machine.
  3. Running this script to work on the data of the past 2 years will require roughly 25 minutes to complete. And requires roughly 40 minutes to complete using the dataset of the past 5 years. And requires hours to complete on the entire dataset with at least 16GB memory.
  4. It is a good idea to take a slice (past two years) of the dataset and store it, that will help improve perfoamnce significantly especialy for SEARCH and SORT algorithms that are utilized by the database engine.

Algorithm Performance

  • Sort algorithms used by the database engines vary in performance between O($N log N$) and O($ N^{2} $) where $N$ is the size of the number

  • Search algorithms used by the database engines vary in performance between O($log N$) and O($ N $) where $N$ is the size of the number

Lets start executing different Queries

Query #1:

  • Calculate the total number of crimes in every district and plot that on Choropleth map
In [3]:
cursor.execute("SELECT district, count(district) from crimes GROUP BY district")
rows=cursor.fetchall()
In [4]:
crimes_per_district = pd.DataFrame(rows, columns=['dist_num','number_of_crimes'])
crimes_per_district['dist_num'] = crimes_per_district['dist_num'].astype(str)

crimes_per_district.head()
Out[4]:
dist_num number_of_crimes
0 24 412
1 11 1113
2 8 886
3 19 608
4 25 726
In [5]:
crimes_dataset = pd.DataFrame(rows)
In [6]:
crimes_dataset.head()
Out[6]:
0 1
0 24 412
1 11 1113
2 8 886
3 19 608
4 25 726
In [7]:
total_number_of_crimes_per_district_map = folium.Map(location =(41.8781, -87.6298),zoom_start=11)
In [8]:
total_number_of_crimes_per_district_map.choropleth(geo_data="Boundaries.geojson", 
              fill_color='OrRd', 
              fill_opacity=0.5, 
              line_opacity=1,
              data = crimes_per_district,
              key_on='feature.properties.dist_num',
              columns = ['dist_num', 'number_of_crimes']
              )
//anaconda3/lib/python3.7/site-packages/folium/folium.py:415: FutureWarning: The choropleth  method has been deprecated. Instead use the new Choropleth class, which has the same arguments. See the example notebook 'GeoJSON_and_choropleth' for how to do this.
  FutureWarning
In [9]:
cursor.execute("""SELECT ST_X(ST_AsText(Where_IS)), ST_Y(ST_AsText(Where_IS)), district from police_stations where district!='Headquarters'""")
police_stations = cursor.fetchall()

for police_station in police_stations:
    police_station_location = (police_station[0],police_station[1])
    cursor.execute("SELECT district, count(district) from crimes where district= %s GROUP BY district",[police_station[2]])
    districts_crime_numbers = cursor.fetchall()
    for district in districts_crime_numbers:
        folium.Marker(location = police_station_location,popup=folium.Popup(html="District No : %s  has   Total Number of Crimes:%s" %district ,max_width=450)).add_to(total_number_of_crimes_per_district_map)
  • Lets plot the Choropleth map and notice the intensity of color on the different districts
  • The Blue POPUP represents the location of police station in the different districts in the map
In [10]:
total_number_of_crimes_per_district_map
Out[10]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Query #2:

  • Calculate the total number of violent crimes in every district and plot that in a table on Choropleth map

Well, we really need only the violent crimes per district, so we will filter only those crimes that we are interested in. Please note that we are not interested to plot property crimes, we are really after violent crimes and in particular Gun related crimes.

So for now, lets plot violent crimes on Choropleth map and later on we will filter only Gun related crimes

In [11]:
violent_crime_categories='THEFT','ASSAULT','ROBBERY','KIDNAPPING','CRIM SEXUAL ASSAULT','BATTERY','MURDER'
In [12]:
cursor.execute("SELECT district, count(district) from crimes where PRIMARY_TYPE in %s GROUP BY district",[violent_crime_categories])
rows=cursor.fetchall()
violent_crime_data=pd.DataFrame(rows, columns=['district_num','number_of_violent_crimes'])
violent_crime_data['district_num'] = violent_crime_data['district_num'].astype(str)
violent_crime_data
Out[12]:
district_num number_of_violent_crimes
0 8 488
1 11 407
2 24 216
3 19 368
4 25 362
5 4 400
6 14 309
7 3 387
8 17 220
9 20 147
10 22 217
11 9 312
12 7 367
13 10 362
14 1 624
15 5 293
16 18 605
17 2 349
18 16 252
19 15 291
20 6 457
21 12 430
In [13]:
violent_crimes_per_district_map= folium.Map(location =(41.8781, -87.6298),zoom_start=11)
violent_crimes_per_district_map.choropleth(geo_data="Boundaries.geojson", 
              fill_color='YlOrRd', 
              fill_opacity=0.5, 
              line_opacity=1,
              data = violent_crime_data,
              key_on='feature.properties.dist_num',
              columns = ['district_num', 'number_of_violent_crimes'],
              legend_name="VOILENT CRIME MAP"
              )
//anaconda3/lib/python3.7/site-packages/folium/folium.py:415: FutureWarning: The choropleth  method has been deprecated. Instead use the new Choropleth class, which has the same arguments. See the example notebook 'GeoJSON_and_choropleth' for how to do this.
  FutureWarning
In [14]:
cursor.execute("""SELECT ST_X(ST_AsText(Where_IS)), ST_Y(ST_AsText(Where_IS)), district from police_stations where district!='Headquarters'""")
police_stations = cursor.fetchall()

for police_station in police_stations:
    police_station_location =(police_station[0],police_station[1])
    cursor.execute("SELECT PRIMARY_TYPE, count(PRIMARY_TYPE) from crimes where district =%s AND PRIMARY_TYPE in %s GROUP BY PRIMARY_TYPE",[police_station[2],violent_crime_categories])
    data = cursor.fetchall()
    violent_crimes_per_district_df = pd.DataFrame(data, columns=['Description', 'Number of Violent Crimes'])
    header = violent_crimes_per_district_df.to_html(classes='table table-striped table-hover table-condensed table-responsive')
    folium.Marker(location=police_station_location, popup=folium.Popup(html="District Number %s - Violent Crimes %s" %(police_station[2],header))).add_to(violent_crimes_per_district_map)
In [15]:
violent_crimes_per_district_map
Out[15]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Query #3:

  • Calculate the total number of gun related violent crimes in every district and plot that in a table on Choropleth map

Lets first create a dataframe of gun crimes per district first to get an idea about the number of gun crimes per district

In [16]:
gun='%GUN%'
cursor.execute("SELECT district, count(district) from crimes where DESCRIPTION::text LIKE %s GROUP BY district",[gun])
districts_gun_violent_crimes = cursor.fetchall()
districts_gun_violent_crimes_df = pd.DataFrame(districts_gun_violent_crimes, columns=['dist_num','gun_crimes'])
districts_gun_violent_crimes_df['dist_num'] = districts_gun_violent_crimes_df['dist_num'].astype(str)
districts_gun_violent_crimes_df
Out[16]:
dist_num gun_crimes
0 24 9
1 8 49
2 11 69
3 19 11
4 25 43
5 4 50
6 14 12
7 3 57
8 17 7
9 20 5
10 22 24
11 10 58
12 9 39
13 7 67
14 1 6
15 5 44
16 18 14
17 2 34
18 16 12
19 15 34
20 6 63
21 12 38
In [17]:
districts_gun_violent_crimes_map = folium.Map(location =(41.8781, -87.6298),zoom_start=11)
districts_gun_violent_crimes_map.choropleth(geo_data="Boundaries.geojson", 
              fill_color='YlOrRd', 
              fill_opacity=0.5, 
              line_opacity=1,
              data = districts_gun_violent_crimes_df,
              key_on='feature.properties.dist_num',
              columns = ['dist_num', 'gun_crimes'],
              legend_name="GUN CRIME"
              )
//anaconda3/lib/python3.7/site-packages/folium/folium.py:415: FutureWarning: The choropleth  method has been deprecated. Instead use the new Choropleth class, which has the same arguments. See the example notebook 'GeoJSON_and_choropleth' for how to do this.
  FutureWarning

Now, lets create a dataframe of the different types of gun crimes for every district and then plot it on Choropleth map

In [18]:
cursor.execute("""SELECT ST_X(ST_AsText(Where_IS)), ST_Y(ST_AsText(Where_IS)), district from police_stations where district!='Headquarters'""")

gun='%GUN%'
police_stations = cursor.fetchall()

for police_station in police_stations:
    police_station_location = (police_station[0],police_station[1])
    cursor.execute("""SELECT DESCRIPTION, count(DESCRIPTION) from crimes where district=%s and DESCRIPTION::text LIKE %s GROUP BY DESCRIPTION""",[police_station[2],gun])
    district_gun_violent_crimes=cursor.fetchall()
    district_gun_violent_crimes_df=pd.DataFrame(district_gun_violent_crimes, columns=['Description', 'Number of Gun Crime'])
    header = district_gun_violent_crimes_df.to_html(classes='table table-striped table-hover table-condensed table-responsive')
    folium.Marker(location=police_station_location,popup=folium.Popup(html="District No: %s GUN_Crime: %s" %(police_station[2],header) )).add_to(districts_gun_violent_crimes_map)
    
In [19]:
districts_gun_violent_crimes_map
Out[19]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Query #4:

  • Calculate the crime density per district
In [20]:
district=[]
tarea=[]

with open('Boundaries.geojson') as f:
    data = json.load(f)
    a = data['features']
    for i in range(len(a)):
        obj=a[i]['geometry']
        n= a[i]['properties']
        district.append(n['dist_num'])
        tarea.append(area(obj)/10000)

af=pd.DataFrame({'dist_num': district,'district_area_inHectares':tarea})
af['dist_num'] = af['dist_num'].astype(str)
final_data= pd.merge(af, crimes_per_district, on='dist_num', how='inner')
final_data['crime_density'] = round(final_data['number_of_crimes']/(final_data['district_area_inHectares']/100))
final_data
Out[20]:
dist_num district_area_inHectares number_of_crimes crime_density
0 17 2492.727155 386 15.0
1 20 1132.170216 254 22.0
2 19 2225.035732 608 27.0
3 25 2827.989237 726 26.0
4 14 1555.869965 508 33.0
5 7 1688.670732 745 44.0
6 3 1576.063931 695 44.0
7 4 7068.152865 790 11.0
8 6 2099.682124 844 40.0
9 22 3490.416073 442 13.0
10 5 3318.613379 635 19.0
11 24 1406.081387 412 29.0
12 16 8171.776367 473 6.0
13 8 5992.169760 886 15.0
14 18 1215.520046 869 71.0
15 12 2509.453028 700 28.0
16 11 1582.727274 1113 70.0
17 15 989.631393 530 54.0
18 10 2038.988883 779 38.0
19 1 1214.818895 854 70.0
20 9 3505.216898 585 17.0
21 2 1949.690970 602 31.0

Query #5:

  • Create Marker Clusters on Choropleth map for those gun related violent crimes that resulted in arrest (green icon) and those that didn't (red icon)
In [21]:
gun_crime_arrests_map = folium.Map(location =(41.8781, -87.6298),zoom_start=11)
gun_crime_arrests_map.choropleth(geo_data="Boundaries.geojson", 
              fill_color='YlOrRd', 
              fill_opacity=0.5, 
              line_opacity=1,
              data = districts_gun_violent_crimes_df,
              key_on='feature.properties.dist_num',
              columns = ['dist_num', 'gun_crimes'],
              legend_name="GUN CRIME"
              )
//anaconda3/lib/python3.7/site-packages/folium/folium.py:415: FutureWarning: The choropleth  method has been deprecated. Instead use the new Choropleth class, which has the same arguments. See the example notebook 'GeoJSON_and_choropleth' for how to do this.
  FutureWarning
In [22]:
cursor.execute("""SELECT ST_X(ST_AsText(Where_IS)), ST_Y(ST_AsText(Where_IS)), district from police_stations where district!='Headquarters'""")
gun='%GUN%'

police_stations = cursor.fetchall()

marker_cluster = MarkerCluster().add_to(gun_crime_arrests_map)

for police_station in police_stations:
    police_station_location = (police_station[0],police_station[1])
    cursor.execute("""SELECT DISTINCT ON(caseno) caseno, block,DESCRIPTION, count(arrest), arrest,latitude, longitude from crimes where district=%s and DESCRIPTION::text LIKE %s GROUP BY caseno,block, DESCRIPTION,arrest, latitude, longitude""",[police_station[2],gun])
    crimes_per_district = cursor.fetchall()
    for crime in crimes_per_district:
        if crime[4]==True:
            folium.Marker(location=(crime[5],crime[6]),popup=folium.Popup(html="District No: %s <br> Description: %s <br> Block: %s" %(police_station[2],crime[2],crime[1])),icon=folium.Icon(color='green', icon='ok-sign'),).add_to(marker_cluster)
        else:
            folium.Marker(location=(crime[5],crime[6]),popup=folium.Popup(html="District No: %s <br> Description: %s<br> Block: %s" %(police_station[2],crime[2],crime[1])),icon=folium.Icon(color='red', icon='remove-sign'),).add_to(marker_cluster)

            
In [23]:
crimes_per_district[:3]
Out[23]:
[('JC100027',
  '017XX N MAYFIELD AVE',
  'UNLAWFUL POSS OF HANDGUN',
  1,
  True,
  41.91205157,
  -87.7728073),
 ('JC100191',
  '017XX N NEWCASTLE AVE',
  'UNLAWFUL POSS OF HANDGUN',
  1,
  True,
  41.91174566,
  -87.7960732),
 ('JC103615',
  '019XX N LA CROSSE AVE',
  'ARMED: HANDGUN',
  1,
  False,
  41.91638097,
  -87.74732207)]
In [24]:
gun_crime_arrests_map
Out[24]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Query #6:

  • Plot on Choropleth map the farthest Block that has a gun crime from every police station in every district
In [25]:
farthest_block_gun_crime_map = folium.Map(location =(41.8781, -87.6298),zoom_start=11)
farthest_block_gun_crime_map.choropleth(geo_data="Boundaries.geojson", 
              fill_color='YlOrRd', 
              fill_opacity=0.5, 
              line_opacity=1,
              data = districts_gun_violent_crimes_df,
              key_on='feature.properties.dist_num',
              columns = ['dist_num', 'gun_crimes'],
              legend_name="GUN CRIME"
              )
In [26]:
cursor.execute("""SELECT ST_X(ST_AsText(Where_IS)), ST_Y(ST_AsText(Where_IS)), district from police_stations where district!='Headquarters'""")
police_stations = cursor.fetchall()

gun='%GUN%'

for police_station in police_stations:
    
    cursor.execute("""SELECT DISTINCT on (A.block) A.district,A.block, A.where_is,ST_Distance(A.where_is,B.where_is) from crimes as A, police_stations as B 
    where ST_Distance(A.where_is,B.where_is) in ( SELECT max(dist) FROM 
    (SELECT ST_Distance(A.where_is,B.where_is) as dist from crimes as A, police_stations as B where A.district=%s 
    and DESCRIPTION::text LIKE %s and B.district= %s ) as f)""",[police_station[2],gun,police_station[2]])
    
    farthest_block_gun_crime = cursor.fetchall()

    cursor.execute("SELECT ST_X(ST_AsText(%s)), ST_Y(ST_AsText(%s))",(farthest_block_gun_crime[0][2],farthest_block_gun_crime[0][2]))
    farthest_block_gun_crime_location = cursor.fetchall()
    folium.Marker(location=(police_station[0],police_station[1]),popup=folium.Popup(html="Police Station <br> District No.:%s <br> Farthest Gun_Crime Block:%s"%(farthest_block_gun_crime[0][0],farthest_block_gun_crime[0][1]))).add_to(farthest_block_gun_crime_map)
    folium.CircleMarker(farthest_block_gun_crime_location[0],radius=5,color='#ff3187',popup=folium.Popup(html="District No.:%s <br> Block:%s"%(farthest_block_gun_crime[0][0],farthest_block_gun_crime[0][1]))).add_to(farthest_block_gun_crime_map) 
    
In [27]:
farthest_block_gun_crime_map
Out[27]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Requirements

The PDF document your are submitting must have the source code and the output for the following requirements

Requirement #1:

  • Locate the Block that has the higest number of gun crimes. The popup on Choropleth map shall display the Block in every district along with the total number of gun crimes for that block
In [28]:
block_max_gun_crime_map = folium.Map(location =(41.8781, -87.6298),zoom_start=11)
block_max_gun_crime_map.choropleth(geo_data="Boundaries.geojson", 
              fill_color='YlOrRd', 
              fill_opacity=0.5, 
              line_opacity=1,
              data = districts_gun_violent_crimes,
              key_on='feature.properties.dist_num',
              columns = ['dist_num', 'gun_crimes'],
              legend_name="Blocks with highest gun crimes"
              )

cursor.execute("""SELECT ST_X(ST_AsText(Where_IS)), ST_Y(ST_AsText(Where_IS)),  district from police_stations where district!='Headquarters'""")
police_stations = cursor.fetchall()

gun='%GUN%'

for police_station in police_stations:
    police_station_location = (police_station[0],police_station[1]) # Police station table was created much earlier; we repeatedly use them; position [2] has distr
 
    query = f'''
    SELECT block, count(block)
    FROM crimes 
    WHERE crimes.DESCRIPTION ::text LIKE '{gun}' and crimes.district= {police_station[2]}
    GROUP BY block HAVING count(block) >= ALL(SELECT count(block) FROM crimes 
    WHERE crimes.DESCRIPTION::text LIKE '{gun}' and crimes.district = {police_station[2]} GROUP BY block)'''
    
    cursor.execute(query) 
    
    data=cursor.fetchall()
    #print(data)
    block_max_gun_crimes=pd.DataFrame(data, columns=['Block', 'Max Gun Crimes'])
    header = block_max_gun_crimes.to_html(classes='table table-striped table-hover table-condensed table-responsive')
    folium.Marker(location=police_station_location,popup=folium.Popup(html="District No: %s GUN_Crime: %s" %(police_station[2],header) )).add_to(block_max_gun_crime_map)
block_max_gun_crime_map
Out[28]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Requirement #2:

  • Calculate the gun crimes density in every district
In [29]:
gun='%GUN%'
cursor.execute("SELECT district, count(district) from crimes where DESCRIPTION::text LIKE %s GROUP BY district",[gun])
districts_gun_violent_crimes = cursor.fetchall()
districts_gun_violent_crimes_df = pd.DataFrame(districts_gun_violent_crimes, columns=['dist_num','gun_crimes'])
districts_gun_violent_crimes_df['dist_num'] = districts_gun_violent_crimes_df['dist_num'].astype(str)
districts_gun_violent_crimes_df

district=[]
tarea=[]

with open('Boundaries.geojson') as f:
    data = json.load(f)
    a = data['features']
    for i in range(len(a)):
        obj=a[i]['geometry']
        n= a[i]['properties']
        district.append(n['dist_num'])
        tarea.append(area(obj)/10000)

af=pd.DataFrame({'dist_num': district,'district_area_inHectares':tarea})
af['dist_num'] = af['dist_num'].astype(str)
final_data= pd.merge(af, districts_gun_violent_crimes_df, on='dist_num', how='inner')
final_data['gun_crime_density'] = round(final_data['gun_crimes']/(final_data['district_area_inHectares']/100))
final_data
Out[29]:
dist_num district_area_inHectares gun_crimes gun_crime_density
0 17 2492.727155 7 0.0
1 20 1132.170216 5 0.0
2 19 2225.035732 11 0.0
3 25 2827.989237 43 2.0
4 14 1555.869965 12 1.0
5 7 1688.670732 67 4.0
6 3 1576.063931 57 4.0
7 4 7068.152865 50 1.0
8 6 2099.682124 63 3.0
9 22 3490.416073 24 1.0
10 5 3318.613379 44 1.0
11 24 1406.081387 9 1.0
12 16 8171.776367 12 0.0
13 8 5992.169760 49 1.0
14 18 1215.520046 14 1.0
15 12 2509.453028 38 2.0
16 11 1582.727274 69 4.0
17 15 989.631393 34 3.0
18 10 2038.988883 58 3.0
19 1 1214.818895 6 0.0
20 9 3505.216898 39 1.0
21 2 1949.690970 34 2.0

Requirement #3:

  • Locate the farthest UNLAWFUL POSS OF HANDGUN crime from the police station in every district. The popup on Choropleth map shall display the district number and the block
In [30]:
UPH='%UNLAWFUL POSS OF HANDGUN%'
cursor.execute("SELECT district, count(district) from crimes where DESCRIPTION::text LIKE %s GROUP BY district",[UPH])
districts_UPH_crimes = cursor.fetchall()
districts_UPH_crimes_df = pd.DataFrame(districts_UPH_crimes, columns=['dist_num','UPH_crimes'])
districts_UPH_crimes_df['dist_num'] = districts_UPH_crimes_df['dist_num'].astype(str)
districts_UPH_crimes_df

farthest_block_UPH_crime_map = folium.Map(location =(41.8781, -87.6298),zoom_start=11)
farthest_block_UPH_crime_map.choropleth(geo_data="Boundaries.geojson", 
              fill_color='YlOrRd', 
              fill_opacity=0.5, 
              line_opacity=1,
              data = districts_UPH_crimes_df,
              key_on='feature.properties.dist_num',
              columns = ['dist_num', 'UPH_crimes'],
              legend_name="Unlawful Possession of Handgun"
              )
//anaconda3/lib/python3.7/site-packages/folium/folium.py:415: FutureWarning: The choropleth  method has been deprecated. Instead use the new Choropleth class, which has the same arguments. See the example notebook 'GeoJSON_and_choropleth' for how to do this.
  FutureWarning
In [31]:
cursor.execute("""SELECT ST_X(ST_AsText(Where_IS)), ST_Y(ST_AsText(Where_IS)), district from police_stations where district!='Headquarters'""")
police_stations = cursor.fetchall()

UPH='%UNLAWFUL POSS OF HANDGUN%'

for police_station in police_stations:
    
    cursor.execute("""SELECT DISTINCT on (A.block) A.district,A.block, A.where_is,ST_Distance(A.where_is,B.where_is) from crimes as A, police_stations as B 
    where ST_Distance(A.where_is,B.where_is) in ( SELECT max(dist) FROM 
    (SELECT ST_Distance(A.where_is,B.where_is) as dist from crimes as A, police_stations as B where A.district=%s 
    and DESCRIPTION::text LIKE %s and B.district= %s ) as f)""",[police_station[2],UPH,police_station[2]])
    
    farthest_block_UPH_crime = cursor.fetchall()
    
    if farthest_block_UPH_crime != []:        
        cursor.execute("SELECT ST_X(ST_AsText(%s)), ST_Y(ST_AsText(%s))",(farthest_block_UPH_crime[0][2],farthest_block_UPH_crime[0][2]))
        farthest_block_UPH_crime_location = cursor.fetchall()
        folium.Marker(location=(police_station[0],police_station[1]),popup=folium.Popup(html="Police Station <br> District No.:%s <br> Farthest UPH crime Block:%s"%(farthest_block_UPH_crime[0][0],farthest_block_UPH_crime[0][1]))).add_to(farthest_block_UPH_crime_map)
        folium.CircleMarker(farthest_block_UPH_crime_location[0],radius=5,color='#ff3187',popup=folium.Popup(html="District No.:%s <br> Block:%s"%(farthest_block_UPH_crime[0][0],farthest_block_UPH_crime[0][1]))).add_to(farthest_block_UPH_crime_map) 
    else:
        pass
In [32]:
farthest_block_UPH_crime_map
Out[32]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Requirement #4:

  • Create Marker Clusters on Choropleth map for those gun related violent crimes that have Location Description as RESIDENCE in (green icon) and those that have Location Desciption as STREET in (red icon)
In [33]:
gun_crime_arrests_map = folium.Map(location =(41.8781, -87.6298),zoom_start=11)
gun_crime_arrests_map.choropleth(geo_data="Boundaries.geojson", 
              fill_color='YlOrRd', 
              fill_opacity=0.5, 
              line_opacity=1,
              data = districts_gun_violent_crimes_df,
              key_on='feature.properties.dist_num',
              columns = ['dist_num', 'gun_crimes'],
              legend_name="GUN CRIME"
              )
In [34]:
cursor.execute("""SELECT ST_X(ST_AsText(Where_IS)), ST_Y(ST_AsText(Where_IS)), district from police_stations where district!='Headquarters'""")
gun='%GUN%'

police_stations = cursor.fetchall()

marker_cluster = MarkerCluster().add_to(gun_crime_arrests_map)

for police_station in police_stations:
    police_station_location = (police_station[0],police_station[1])
    cursor.execute("""SELECT DISTINCT ON(caseno) caseno, block,DESCRIPTION, count(location_description), location_description,latitude, longitude from crimes where district=%s and DESCRIPTION::text LIKE %s GROUP BY caseno,block, DESCRIPTION,location_description, latitude, longitude""",[police_station[2],gun])
    crimes_per_district = cursor.fetchall()
    for crime in crimes_per_district:
        if crime[4]=="RESIDENCE":
            folium.Marker(location=(crime[5],crime[6]),popup=folium.Popup(html="District No: %s <br> Description: %s <br> Block: %s" %(police_station[2],crime[2],crime[1])),icon=folium.Icon(color='green', icon='ok-sign'),).add_to(marker_cluster)
        elif crime[4]=="STREET":
            folium.Marker(location=(crime[5],crime[6]),popup=folium.Popup(html="District No: %s <br> Description: %s<br> Block: %s" %(police_station[2],crime[2],crime[1])),icon=folium.Icon(color='red', icon='remove-sign'),).add_to(marker_cluster)

            
In [35]:
gun_crime_arrests_map
Out[35]:
Make this Notebook Trusted to load map: File -> Trust Notebook
In [ ]: